We model the competition between homologous recombination and point mutation in microbial
genomes, and present evidence for two distinct phases, one uniform, the other genetically diverse.
Depending on the specifics of homologous recombination, we find that global sequence divergence
can be mediated by fronts propagating along the genome, whose characteristic signature on genome
structure is elucidated, and apparently observed in closely-related Bacillus strains. Front
propagation provides an emergent, generic mechanism for microbial "speciation", and suggests a
classification of microorganisms on the basis of their propensity to support propagating fronts.



The transfer of genetic material between microbial cells 
plays a crucial role in their evolution, and poses
fundamental questions to microbiology. Is there a tree of
life for microbes? Are there bacterial species? What are
the mechanisms driving their diversification? These
questions arise because genetic transfer couples the
evolution of different genomes
in a way that not only complicates their dynamics but 
obscures their very identity over time: the evolution is 
communal. While in sexual organisms the communality 
of genome evolution is restricted to species, the major el- 
ements of microbial evolution — genetic transfer followed 
by illegitimate or homologous recombination, point mu- 
tations, genome rearrangements — do not a priori imply 
sharp genetic isolation boundaries. If there are none, no- 
tions such as species and speciation, despite being widely 
used heuristically, are misleading. Also, it is not clear 
whether there are classes of microbes with qualitatively 
different modes of communal evolution and what are the 
cellular properties that distinguish between them. 

Gene transfer results when foreign DNA is taken up 
from the environment (transformation), delivered by a 
virus (transduction) or acquired via a direct cell to cell 
exchange (conjugation), and then permanently incorporated
in the recipient genome by homologous or illegitimate
recombination. Homologous recombination, mediated
by dedicated cellular machinery, plays a vital error
correction role in genome replication but also allows
a foreign DNA fragment to replace a sufficiently similar
portion of the recipient genome. The probability of
successful replacement in homologous recombination
decreases exponentially with the number of sequence
mismatches [10], the mechanism being organism-specific.
Illegitimate recombination can be mediated
by bacteriophage integrases, selfish genetic elements, or 
occur by chance DNA breakage and repair, and allows the 
acquisition of entirely novel traits from evolutionarily
distant organisms. Illegitimate genetic transfer, also known
as horizontal gene transfer (HGT), can be inferred from
the genome data through its atypical sequence composition
and the phylogenetic incongruences it causes [14].
While the extent of HGT is under heated debate, it
is clear that it is much less frequent than homologous
recombination. Relative rates of homologous recombination
and point mutations in natural populations have
been estimated by sequence diversity studies using
multilocus sequence typing data in recently-formed bacterial
strains. The probability that a gene changes as a
result of homologous recombination can be many times
higher than that for point mutations. Another manifestation
of the pervasiveness of homologous recombination is
that the evolution of strains within many named species
cannot be represented by a phylogenetic tree.
While the importance of genetic transfer, and homologous
recombination in particular, is firmly established
[20], there are only a few sharp predictions about the
resulting modes of microbial evolution. Relevant to our
work is the observation of Lawrence that HGT islands
locally inhibit recombination. He concludes that global
genetic isolation can be achieved through the gradual
accumulation of hundreds of HGTs.

The purpose of this paper is to explore the emergent 
properties of the collective evolution of closely related 
bacterial genomes. We model the interplay of homol- 
ogous recombination and point mutations in bacterial 
populations, and show that elementary genome changes 
such as HGT, genome rearrangements, insertions or dele- 
tions can trigger diversification fronts that in evolution- 
ary short time propagate along the bacterial genomes 
and eventually lead to global sequence divergence of sub- 
populations. The diversification fronts can occur even 
in the absence of natural selection and demonstrate that 
fast neutral evolution can have non-trivial long-term evo- 
lutionary consequences. The robustness of this mecha- 
nism is sensitive to some of the details of homologous 
recombination, and suggests a way to classify the spec- 
trum of evolutionary modes in bacteria based on specific 
details of their homologous recombination mechanisms. 
We establish a methodology for analyzing closely related 
genomes and give evidence for a large-scale step-like vari- 
ation of homologous recombination rates in the Bacillus 
cereus group, which might be a signature of a diversifica- 
tion front. Finally, we discuss the biological implications 
of the propagation of diversification fronts, as a mech- 
anism for speciation, a force favoring the formation of 
sharp genetic isolation boundaries, and a dynamical bar- 
rier for HGT and genome rearrangements. 

The details of homologous recombination are by now
reasonably well understood. There are at least
two common obstacles to successful integration of a DNA
fragment. First, the end of the fragment must find a
short region (≈ 20 bp) of sequence identity with the
target genome in order to initiate the process. Second,
the cell's mismatch repair system can abort the
recombination process if it encounters mismatches between the
fragment and the portion of the genome being replaced.
Both of these obstacles lead to an exponential decrease
of recombination with sequence divergence. There are
also potentially important variations in the mechanism.
While in E. coli sequence identity at only one end is
required, in Bacillus very high sequence similarity at both
ends is needed and mismatch repair seems less
important. In Streptococcus the effect of mismatch repair is
intermediate in strength [13] but the overall dependence
of sexual isolation on sequence divergence is very close
to that in Bacillus. In addition, the underlying bases
for distinguishing between donor and recipient DNA can 
differ. Do these differences in the details translate into 
qualitatively different evolutionary behavior? If so, then 
the details of the homologous recombination mechanism 
could be an important criterion for classifying bacteria. 
The computational studies described here clarify which 
details are the relevant determinants of the long-term 
evolutionary dynamics. 

Models:- Based on the above considerations, we construct 
sets of model rules that describe the interplay between 
homologous recombination and point mutations. 

1. There are N circular strings of length L written in 
an alphabet of n symbols. 

2. Each position in each genome is subject to point 
mutations with rate m. A point mutation changes a 
symbol to any other symbol with equal probability. 

3. Each genome receives fragments at an average rate 
r. Each fragment is of size F, is derived from an 
arbitrary position from an arbitrary donor genome 
and attempts to recombine at the same genome po- 
sition in the recipient. 

4. To be considered for incorporation the fragment
must find an identical segment of length M at an
arbitrarily chosen end (Model I) or at both ends
(Model II).

5. The probability of incorporation is exp(-αd),
where α is a coefficient expressing the strength of
the mismatch repair system and d is the pointwise
sequence difference, i.e. d counts the number of
mismatches between the fragment and the genome
sequence it is about to replace. We will also
consider Model III, where rule 4 is absent.
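
The five rules above can be sketched in a few lines of Python. This is a minimal illustration and not the simulation code used for the results reported here: it assumes discrete time steps in which m and r are treated as per-step probabilities, it assumes the donor is drawn uniformly from the other genomes, and the function names (mutate, try_recombine, step) are hypothetical.

```python
import math
import random


def mutate(genomes, m, n, rng):
    """Rule 2: every site changes to a different symbol with probability m per step."""
    for g in genomes:
        for x in range(len(g)):
            if rng.random() < m:
                g[x] = rng.choice([s for s in range(n) if s != g[x]])


def try_recombine(recipient, donor, start, F, M, alpha, model, rng):
    """Rules 3-5: try to copy a fragment of length F from donor into recipient.

    The fragment occupies the same (circular) positions in both genomes.
    Returns True if the fragment was incorporated.
    """
    L = len(recipient)
    idx = [(start + k) % L for k in range(F)]
    left_ok = all(recipient[i] == donor[i] for i in idx[:M])
    right_ok = all(recipient[i] == donor[i] for i in idx[-M:])
    if model == "I":
        # identity of length M required at one arbitrarily chosen end
        if not (left_ok if rng.random() < 0.5 else right_ok):
            return False
    elif model == "II":
        # identity of length M required at both ends
        if not (left_ok and right_ok):
            return False
    # Model III: no identity requirement (rule 4 absent)
    d = sum(recipient[i] != donor[i] for i in idx)
    if rng.random() < math.exp(-alpha * d):
        for i in idx:
            recipient[i] = donor[i]
        return True
    return False


def step(genomes, m, r, F, M, alpha, n, model, rng):
    """One time step: point mutations, then at most one fragment per genome."""
    mutate(genomes, m, n, rng)
    N, L = len(genomes), len(genomes[0])
    for rec in range(N):
        if rng.random() < r:
            don = rng.choice([j for j in range(N) if j != rec])
            try_recombine(genomes[rec], genomes[don], rng.randrange(L),
                          F, M, alpha, model, rng)
```

A run would start from N lists of length L (for example all zeros for the uniform initial condition) and call step repeatedly with a seeded random.Random instance.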

The genome strings can be thought of as representatives
of different strains possessing at least partial ecological
distinctiveness, so that random genetic drift is much
stronger within strains than between strains. With this
interpretation we do not include random genetic drift, but
it can be straightforwardly added.

FIG. 1: Schematic illustrating the process by which a
diversification front propagates along a genome in a selection-neutral
situation. In the vicinity of the HGT island, recombination is
suppressed relative to point mutations, allowing point
differences to build up in the region flanking the HGT island. The
newly accumulated sequence differences lead to the extension
of the region where recombination is inhibited and, in turn,
an accumulation of point differences further away from the
HGT island. The process repeats itself.

Propagation of diversification fronts:- In these models, 
mutation and recombination play opposing roles: point 
mutations generate sequence diversity in the population, 
whereas recombination tends to make sequences more 
similar. At high recombination rates an initially uniform 
population will remain close to uniform; at high muta- 
tion rates all sequences will diverge from each other. An 
important property of homologous recombination is that 
the probability that a recombination event is successful
decreases with sequence divergence and becomes
negligible even for small levels of divergence.

These considerations suggest that the uniform phase 
is metastable: even when recombination is strong enough 
to maintain a state of near uniformity it will not succeed 
in bringing together sufficiently diverged sequences. The 
diverged phase on the other hand is stable. If there is a 
boundary between a stable and a metastable phase the 
generic expectation is that the stable phase will grow at 
the expense of the metastable one, as shown in Fig. 1.
This will happen because homologous recombination is 
inhibited not only in the diverged phase but also in a 
finite region flanking it within the uniform phase.
Mutations will accumulate in the flanking region, and as a
result the diverged phase will grow. We will refer to the
boundary between the uniform and diverged phases as
a diversification front. Therefore, the system has the
potential to sustain the propagation of diversification
fronts. Such diversification fronts can be nucleated by
processes that create regions of sequence difference
between genomes in the population, such as HGT, genome
rearrangements, deletions or insertions, and have
important biological consequences for the evolution and
diversification of microbes, as will be discussed later.

Simulations:- To clarify this intuition we performed a se- 
ries of simulations of a population of interacting genomes, 
starting from two different initial conditions: 1) all se- 
quences are the same, and 2) all sequences are the same 
except for a strip, long compared with the typical size of 
recombining fragments, in which the sequences are ran- 
dom. We used three different models for the rules gov- 
erning the dynamical behavior of homologous recombina- 
tion: Model I, requiring sequence identity at one end of 
the recombining fragment; Model II, requiring sequence 
identity at both ends; and Model III, with no require- 
ment of sequence identity. The central questions ad- 
dressed are: Under what circumstances is there a well 
defined front propagation region; is it readily observable 
or is fine tuning of the parameters required? Do the three 
models differ qualitatively? To address these questions 
in a quantitative manner, we define an order parameter 

$$\psi(x) = \frac{n}{n-1}\,\frac{2}{N(N-1)} \sum_{i<j} \left(1 - \delta_{A_x^i A_x^j}\right) \qquad (1)$$

where A_x^i denotes the letter at position x of genome i.
The order parameter ψ measures the average difference
in the population between the sequences at genome
position x, normalized so that ψ = 1 when the genomes are
uncorrelated. This corresponds to the diverged phase of
the system. In the opposite limit, ψ → 0, the genomes
in the system are highly correlated, giving rise to the
uniform phase of the system.
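
As an illustration, the order parameter can be evaluated position by position as in the sketch below. It assumes that the average runs over all N(N-1)/2 distinct pairs of genomes and that the prefactor n/(n-1) supplies the normalization to unity for uncorrelated sequences; the function name order_parameter is hypothetical.

```python
from itertools import combinations


def order_parameter(genomes, n):
    """Return psi(x) for every position x.

    genomes: list of N equal-length sequences over an alphabet of n symbols.
    Assumes the pairwise average is taken over all N(N-1)/2 distinct pairs.
    """
    N = len(genomes)
    L = len(genomes[0])
    pairs = N * (N - 1) / 2
    norm = n / (n - 1)  # two uncorrelated letters differ with probability (n-1)/n
    psi = []
    for x in range(L):
        differing = sum(1 for gi, gj in combinations(genomes, 2) if gi[x] != gj[x])
        psi.append(norm * differing / pairs)
    return psi
```

For a finite population the value at a single position can exceed 1; the normalization holds on average over uncorrelated sequences.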

For each model, we studied the time evolution of the 
order parameter for different values of m/r and α. Typical
values used for the other parameters are F = 500,
M = 10, L = 10000, N = 20, n = 2. For each separate
run we measured ψ as a function of position within the
genome and time. By varying α, we control the strength
of the mismatch repair mechanism, and hence the success
rate of recombination. The most important trend probed
by our simulations is the behavior of the order parameter
as a function of the ratio μ = m/r, the relative strength
of point mutations versus recombination.

Results for Models I and III:- For sufficiently low values 
of the mismatch repair coefficient α, the equilibrium value of the order parameter varies
gradually with μ = m/r, as shown in Fig. 2. The uniform
and random strip initial conditions always relax to the 
same final state. The random strip simply dissolves and 
no front propagation is observed. This situation arises 
when recombination is allowed almost regardless of the 
degree of sequence divergence. 

FIG. 2: The equilibrium value of the order parameter changes
gradually with m/r in Model I with α = 0, F = 500, M = 10,
L = 10000, N = 20 and n = 2. The inset figure depicts a typical
time evolution of the genome population. The vertical axis
represents position along the genome, the colorscale indicating
the value of the order parameter (blue denoting uniform
phase, red denoting diverged phase), while the horizontal axis
is simulation time. A random strip dissolves without triggering
a diversification front.

Above a threshold value of α, the uniform and diverged
phases become distinct: for small values of μ = m/r, the order
parameter is 0, and the system is genetically uniform.
However, for large values of μ, the order parameter is
close to unity, indicating that the system is genetically
diverged. This transition appears to be sharp, as shown
in Fig. 3. There is further interesting dynamical behavior
as a function of μ. For μ > μ_u the uniform phase
becomes unstable and the sequences diverge everywhere
simultaneously. For μ < μ_s, the uniform phase is stable,
and a finite region of diverged phase shrinks as a function
of time, i.e. the uniform phase invades the diverged one.
For μ_s < μ < μ_u, diversification proceeds through
nucleation and growth of the diverged phase; in this parameter
range, front propagation occurs.

From this behavior, we deduce the qualitative phase
diagram presented in Fig. 4a. Model III, with no sequence
identity requirement, shows qualitatively similar results 
(data not shown). 

Results for Model II:- For Model II, with sequence identity
requirement at both ends, we observe front propagation
even for α = 0. Moreover, the width w = μ_u/μ_s
of the interval μ_s < μ < μ_u, where front propagation
occurs, is very large. While for Models I and III we
always observed w < 2, for Model II we could not even
observe the point μ_u, and w > 100. This results in the
phase diagram qualitatively represented in Fig. 4b. The
front speed can be as high as several times the fragment 
size per average point mutation time near the transition, 
and is a rapidly decreasing function of the recombination 
rate. 

To summarize, there is a qualitative difference between
the situation with no sequence identity requirement
(Model III) or sequence identity requirement at
only one end (Model I) and Model II with sequence identity
requirement at both ends. The difference is manifested
in the phase diagram and the width of the front
propagation region.

FIG. 3: Starting from a uniform state, the order parameter
equilibrates to values close to 0 or 1 in Model I with α = 0.4,
F = 500, M = 10, L = 10000, N = 20 and n = 2, indicating
the existence of distinct uniform and diverged phases. The
inset figures depict the genome population for the indicated
value of m/r, as a function of time. The vertical axis
represents position along the genome, the colorscale indicating the
value of the order parameter (blue denoting uniform phase,
red denoting diverged phase), while the horizontal axis is
simulation time. For μ_s < μ < μ_u the random strip triggers a
diversification front. For μ close to μ_u spontaneous nucleation
is possible.

Microbe classification:- These theoretical predictions im- 
ply that we can classify microbial genomes according 
to the details of the recombination dynamics: class I, 
consisting of models I and III, and class II, consisting 
of model II. The distinguishing feature of the classes is 
whether or not the recombination dynamics requires se- 
quence identity at both ends of the incorporated segment. 
For Class II, as long as the uniformity of a population is 
maintained by homologous recombination, it will support 
propagating diversification fronts. For Class I, diversifi- 
cation fronts are possible only within a narrow interval 
of the ratio of mutation to recombination rates and are 
therefore unlikely. 

The existence of class I and class II indicates that the 
details of homologous recombination are important be- 
yond the fact that the probability of recombination expo- 
nentially decreases with sequence divergence. Therefore 
it is necessary to elucidate further the differences between 
homologous recombination mechanisms in different bacteria
and work out their consequences for front propagation.
For example, if mismatch repair is nick-directed
and not methyl-directed then more mismatches will
be detected near the ends of the recombining fragments.
This, in turn, will make front propagation more robust,
because a greater fraction of the average homogenizing
capability of recombination will be inhibited by a phase
boundary. Also, if non-homologous DNA loops formed
during the recombination process are not corrected
efficiently, then small deletions, insertions, slippage and
inversions would not trigger diversification fronts. Since
micro rearrangements are presumably frequent, the
efficiency of loop repair will be an important factor in
determining the rate of nucleation of fronts. Finally, it is
important to know whether and how the length of
the incorporated fragments depends dynamically on
the differences between the donor and recipient.

FIG. 4: a. The phase diagram of Models I and III. Distinct
phases exist only above a threshold value of α and the width
of the front propagation region, μ_u/μ_s, is less than 2. b.
The phase diagram of Model II. Distinct phases exist for all
values of α and the front propagation region is very wide:
μ_u/μ_s > 100.

In order to seek evidence for the front propaga- 
tion mechanism, we now compare available completely 
sequenced genomes of closely-related microbes. The 
most direct evidence for front propagation from genome 
data alone would be an extended step-like pattern in 
the sequence divergence of closely-related well-aligned 
genomes, with the diverged region centered around a re- 
gion of HGT, deletion or genome rearrangement. The 
front profile reflects the different times after genetic iso- 
lation of different parts of the chromosome. Under con- 
ventional uniform molecular clock assumptions, it will 
be approximately linear, with a slope determined by the 
distance the front travels during the time it takes the se- 
quences to fully diverge once recombination is inhibited. 
Slowly changing components of the sequence divergence,
such as non-synonymous substitutions, lead to more
extended profiles.

Analysis of genome data:- We consider the sequenced 
genomes in the genus Bacillus. It is in Bacillus that 
Majewski and Cohan [12] discovered the requirement for
sequence identity at both ends, and our simulations
indicate that front propagation is more likely to occur in
such systems. 

We obtained the complete genome sequences from the
NCBI database, together with the positions and orientations
of the known or predicted protein coding regions,
tRNAs and rRNAs. We globally aligned all pairs using
the nucmer script of the MUMMER package
(nucmer -b 50 -g 300 -c 65 -mum), obtaining a list of
well aligned regions for each pair. The genomes of three
Bacillus cereus strains (ATCC 10987, ATCC 14579 and ZK),
three Bacillus anthracis strains (Ames, Ames Ancestor
and Sterne), and Bacillus thuringiensis serovar konkukian
str. 97-27 were close and highly co-linear, and were
analyzed further. The three anthracis strains were practically
identical and only Ames was used in the analysis.

FIG. 5: To construct the divergence profiles we first identify
the well aligned regions (represented by color bars and arrows)
using MUMMER, then map the differences (represented by
red circles) onto the reference genome and slide a window of
width W along the genome.

For each pair, we mapped the well-aligned regions on
one of the genomes, and constructed a series of
coarse-grained profiles by sliding a window of width W along
the genome while excluding non-aligned regions (resulting
from insertions and deletions) from the averaging, as
depicted graphically in Fig. 5. The profiles have gaps
where the window covers fewer than fW unambiguously
aligned nucleotides, f being a threshold fraction. We used W in
the range of 40k to 120k and f between 0.5 and 0.8. We
looked at the coarse-grained profiles for the DNA point
differences, as well as intergene, intragene, 3rd codon, 1st
and 2nd codon, synonymous and non-synonymous (as
defined in [23]) differences.
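
The coarse-graining step can be sketched as follows. This is a minimal illustration under stated assumptions: the pairwise alignment is taken to be already reduced to two per-position boolean lists on the reference genome (aligned and differs), the window is linear rather than wrapped around the circular chromosome, and the function name divergence_profile and its step argument are hypothetical.

```python
def divergence_profile(aligned, differs, W, f, step):
    """Sliding-window fraction of differing sites among aligned sites.

    aligned[i] is True if reference position i is unambiguously aligned;
    differs[i] is True if the aligned base at position i differs.
    Returns (window_center, value) pairs; value is None (a gap in the
    profile) when fewer than f * W positions in the window are aligned.
    """
    L = len(aligned)
    cum_aligned = [0]
    cum_diff = [0]
    for a, d in zip(aligned, differs):
        cum_aligned.append(cum_aligned[-1] + (1 if a else 0))
        cum_diff.append(cum_diff[-1] + (1 if (a and d) else 0))
    profile = []
    for start in range(0, L - W + 1, step):
        n_aligned = cum_aligned[start + W] - cum_aligned[start]
        n_diff = cum_diff[start + W] - cum_diff[start]
        if n_aligned > 0 and n_aligned >= f * W:
            value = n_diff / n_aligned
        else:
            value = None
        profile.append((start + W // 2, value))
    return profile
```

The same function can be applied to any difference component (synonymous, intergene, and so on) by passing the corresponding flags in differs and restricting aligned to the sites of that type.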

B. cereus ATCC 10987 exhibits a distinct step-like
pattern of sequence difference when compared to B. cereus ZK,
B. anthracis Ames and B. thuringiensis serovar konkukian str.
97-27. The pattern is also present in each of the other
difference components: synonymous, non-synonymous,
gene, intergene. What is the explanation for this
pattern? Does it involve homologous recombination or not?
Is it a result of front propagation during the separation
of B. cereus ATCC 10987 from the common ancestor
of B. cereus ZK, B. anthracis Ames and B. thuringiensis serovar
konkukian str. 97-27?

To answer these questions, we first examined the vari- 
ation of the nucleotide composition along the genome. 
Based on the GC and AT skews the replication termi- 
nus is located at around 2.6Mb - away from the posi- 
tion of the difference profile step. The GC content varies 
smoothly along the genome and does not exhibit a step 
pattern. It has a minimum near the replication terminus. 
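
For reference, a windowed GC skew, (G - C)/(G + C), and its running sum can be computed as in the sketch below. The non-overlapping windows and the function names gc_skew and cumulative are illustrative assumptions; in many bacterial chromosomes the extrema of the cumulative skew lie near the replication origin and terminus, which is the property used to locate the terminus.

```python
def gc_skew(seq, window):
    """GC skew (G - C) / (G + C) in consecutive non-overlapping windows."""
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window].upper()
        g, c = w.count("G"), w.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative(values):
    """Running sum of the windowed skew values."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out
```

The AT skew is obtained in the same way by counting A and T instead of G and C.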

The step pattern is partially correlated with the den- 
sity of protein coding regions in the above genomes, the 
sequence differences being larger where the density is 
lower. However, since all difference components exhibit 
the pattern, it cannot be simply an artifact due to dif- 
ferent proportions of gene and intergene regions with dif- 
ferent mutation rates. Moreover, within the well aligned 
regions, the intergene regions are, on average, only about 
15% more divergent than protein coding regions and the 
gene density varies only in the 75-90% range.
Therefore, the small differences in the proportions of 
sites with different mutation rates would have to have 
been somehow amplified if varying coding density were 
the underlying cause of the pattern. The non-aligned
regions have a higher intergene fraction than aligned ones,
suggesting a possible mechanism by which the density
of protein coding regions can indirectly affect sequence
divergence: a preferential accumulation of inter-strain
alignment gaps in intergene regions and a corresponding
reduction of recombination rates.

Could it be that not just the proportion of site types, 
but the point mutation rates themselves vary gradually 
along the genome, leading to the above pattern? To
answer this question, we turn to the distribution of lengths
of maximal exact matches (DLMEM) between pairs of
aligned sequences. If differences had accumulated by a
Poisson mutational process, then we would expect an
exponential distribution. Recombination, on the other
hand, will lead to a broader distribution and, for
example, a deviation from the Poisson statistics value (unity)
for the ratio of the standard deviation to the mean [25].

Whether these deviations are statistically significant 
can be determined by comparing with the distribution of 
this ratio for the case without recombination. 
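
A minimal sketch of the DLMEM statistic is given below. It assumes a gap-free alignment of two equal-length sequences, so that maximal exact matches are simply the runs of identical positions between consecutive mismatches; runs of length zero (adjacent mismatches) are not counted, and the function names are hypothetical.

```python
import statistics


def exact_match_lengths(a, b):
    """Lengths of maximal runs of identical positions in a gap-free alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    lengths, run = [], 0
    for x, y in zip(a, b):
        if x == y:
            run += 1
        else:
            if run:
                lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def std_over_mean(lengths):
    """Ratio of the (population) standard deviation to the mean."""
    return statistics.pstdev(lengths) / statistics.mean(lengths)
```

For differences placed independently at a low uniform rate the ratio is close to 1; clustering of differences, as expected from recombination, broadens the distribution and raises it.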

We gathered DLMEM statistics for different well-aligned
regions. The ratio of the standard deviation
to the mean is significantly above 1, as shown in Fig. 7a.
Moreover, there is a positive correlation between this
ratio and the length of the uninterrupted well-aligned
regions, a trend which agrees with the notion that
non-aligned parts inhibit recombination within the adjacent
aligned regions.

We then looked for evidence of different rates of homologous
recombination along the chromosome by studying
the changes in the DLMEM statistics in a sliding window.
There is again a step-like pattern for the ratio of the
standard deviation to the mean, as shown in Fig. 7b.

FIG. 6: The step-like profile of the sequence difference between
Bacillus cereus ATCC 10987 and Bacillus cereus ZK
obtained by sliding a 60k window with f = 2/3 along the

FIG. 7: DLMEM statistics resulting from the comparison of
Bacillus thuringiensis and Bacillus cereus ATCC 10987. a.
The std/mean for the distribution of lengths of maximal exact
matches within a well-aligned region is positively correlated
with the length of the region. The actual data (blue dots)
is contrasted with a null hypothesis with matched sequence
difference for each region (red *). b. The std/mean DLMEM
profile obtained using a 120k window with f = 0.5 along
Bacillus thuringiensis exhibits a step-like pattern.

Deviation from unity of the ratio of the standard deviation to the
mean of a DLMEM is a sign of clustering of the differences
along the chromosome. Are there reasons for clustering
which do not involve homologous recombination?
If different genes have very different evolution rates, then
this can lead to apparent clustering. For example, different
gene expression levels can lead to different synonymous
mutation rates and an apparent clustering of
differences within the weakly expressed genes. To control
for this, we compare the DLMEM for neutral mutations
with a null model with matched neutral divergence
of each protein coding region separately. The pattern is
present in the real data but almost completely disappears
in the control. The residue is due to correlations of the
divergences of adjacent proteins, which are expected in
the presence of homologous recombination. Since,
presumably, there is no reason apart from recombination for
clustering of synonymous substitutions within each gene
separately, this test not only rules out genes with different
evolutionary rates as an explanation but also gives
confidence that the deviations of the standard deviation
over mean from unity are predominantly due to homologous
recombination.
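
One way to build a control with matched divergence per coding region is sketched below. It is an illustration only and not necessarily the procedure used for the null hypothesis in Fig. 7: it assumes that differences are stored as per-site flags and that each coding region is a half-open interval (start, end), and it shuffles the flags inside each region so that the divergence of every region is preserved while any clustering within it is removed. The name matched_null is hypothetical.

```python
import random


def matched_null(diff_flags, regions, rng):
    """Shuffle difference flags within each region, keeping per-region counts.

    diff_flags: list of 0/1 flags, one per site.
    regions: list of half-open (start, end) intervals, assumed non-overlapping.
    Sites outside all regions are left unchanged; the input list is not modified.
    """
    out = list(diff_flags)
    for start, end in regions:
        block = out[start:end]
        rng.shuffle(block)
        out[start:end] = block
    return out
```

Comparing the std/mean of the exact-match lengths in the real flags with that in many shuffled replicates gives the distribution expected without within-region clustering.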

Further evidence supporting the homologous recombi- 
nation interpretation of the ratio of the standard devia- 
tion and the mean of DLMEM comes from contrasting 
the above observations with the results of the comparison 
between the completely sequenced Buchnera aphidicola 
strains APS, BP and SG. Because these are intracellular
parasites lacking the RecA gene, we expect no homologous
recombination. Indeed, we find that there is no
statistically significant deviation from unity of the stan- 
dard deviation over mean and a highly uniform difference 
profile. 

In summary, the above data indicate that there are 
large-scale step-like variations of the rates of homologous 
recombination along the analyzed microbial genomes, ap- 
parently consistent with the hypothesis that diversifica- 
tion proceeded by front propagation. 

Discussion:- In this section, we discuss the consequences 
of the front propagation mechanism for the fate of bac- 
teria that have acquired useful skills through HGT or 
have undergone a large-scale genome rearrangement. We 
argue that the front propagation mechanism facilitates 
global genetic isolation between strains, and, as such, is a 
mechanism for what may be loosely termed "speciation".
On the other hand, the front propagation mechanism
reduces the chances that chromosomal changes, such as
incorporation of HGTs or rearrangements, will be
evolutionarily successful, thus creating a dynamical barrier to
the accumulation of such mutations in evolutionary time. 

A bacterium can acquire a new skill by means of HGT. 
This can lead to the extinction of those bacteria which 
do not possess the beneficial (under appropriate selection 
pressure) HGT fragment. Alternatively, HGT can allow 
the invasion or foundation of a new biochemical niche, 
while being disadvantageous in the former one, or lead 
to specialization within the old niche. (Indeed, ecological 
distinctiveness without spatial isolation is not unusual 
for microbes. Even in the simplest of environments,
monoculture lab experiments, coexisting strains emerge
spontaneously. However, the creation of coexisting
genotypes by HGT cannot properly be termed speciation, 
because the genotypes are not genetically isolated with 
respect to homologous recombination, except for a small 
region surrounding the HGT.) 

The front propagation mechanism makes local isola- 
tion unstable, because the HGT event nucleates a diver- 
sification front leading eventually to a global isolation 
of the carriers of the HGT event from the rest of the 
population. Therefore, ecological distinctiveness accompanied
by local isolation is enough to generate speciation,
even when homologous recombination is not reduced by
the ecological distinctiveness. Note that this outcome
is different from the one proposed by Lawrence, who
suggested that global isolation is only achieved through
the accumulation of hundreds of HGTs. Our work has
demonstrated that even a single HGT or genome
rearrangement can lead to global sequence divergence.

It is difficult to apply the biological species concept to 
groups of strains that are isolated at some loci and not at 
others [23]. Because of diversification front propagation,
a community of bacteria in which pairs of bacteria are ge- 
netically isolated at some loci, but not others, is unstable 
and tends to partition itself into groups which are glob- 
ally isolated from each other with respect to homologous 
recombination. This is because genetically isolated re- 
gions will suppress recombination and trigger fronts into 
neighboring non-isolated regions. This instability will be 
even stronger if the different genomes are not colinear or 
do not have the same set of genes. Therefore, well de- 
fined genetic isolation boundaries emerge spontaneously 
through the front propagation mechanism even if there 
is no functional barrier to gene transfer. 

What happens when an HGT or a rearrangement brings 
some advantage, but without enabling the recipient to 
adopt an entirely distinct ecological role? Achieving com- 
plete ecological distinctiveness might be a gradual pro- 
cess. In this case the new genotype will be successful ini- 
tially but not necessarily in the long run because it will be 
competing with other beneficial mutations at other loci 
that emerge throughout the population. Beneficial mu- 
tations trigger selective sweeps that can be either global, 
purging the diversity throughout some ecological niche 
or, because of homologous recombination, local, purg- 
ing the diversity only around the locus of the beneficial 
mutation. In a population in which relative sequence 
uniformity is maintained by homologous recombination, 
local selective sweeps will be the norm. However, a front 
nucleated in the carriers of an HGT or a rearrangement 
will propagate by accumulation of neutral 
mutations, and potentially lead to global genetic isolation 
of the carriers long before they have a chance to achieve 
full ecological distinctiveness. 

New strains are easily formed by readily absorbing for- 
eign genetic material, rearranging the genomes, etc. But 
they are typically short-lived entities, because following 
front propagation they are excluded from the communal 
evolution. Front propagation implies that the evolution- 
ary rate of HGT accumulation is less than the rate sug- 
gested by looking at strains. This can be, in principle, 
tested against the data. This mechanism can also ex- 
plain why gene order is highly conserved in some bacterial 
groups: there exists a dynamical barrier to the survival 
of rearranged genomes. 

These considerations also have implications for the ap- 
plicability of molecular phylogenetics, and the ongoing 
debate about the nature of the impact of HGT on the 
tree of life. Front propagation limits the impact of HGT, 
reinforcing in a complementary way Woese's concept of 
a complexity barrier to HGT. Our argument is complementary 
because it does not rely on the nature of the 
interactions between the genes: there is a barrier to HGT 
arising from the population dynamics alone. 

Our work leaves open a number of interesting issues 
related to the effect of highly conserved regions on front 
propagation. A large immutable region can present an 
impassable obstacle to front propagation. Candidates for 
such obstacles are rRNA operons, tRNA genes and overlapping 
genes. Such regions lack the flexibility arising 
from the degeneracy of the genetic code. HGT islands 
inserted near front obstacles will lead to the diversifica- 
tion of a smaller fraction of the recipient genome, and 
have a greater chance to avoid extinction. Is there a correlation 
between evolutionarily persistent HGTs and RNA 
gene positions? If a genome region is already diversified 
there is no penalty for the incorporation of another useful 
HGT island. Is there clustering of HGT islands? How is 
front propagation modified for clonal bacteria? Finally, 
is front propagation beneficial? If front propagation 
obstacles are allowed to evolve or at least reposition 
themselves, what configuration of obstacles would result? 

On the basis of computer simulations, we have sug- 
gested that the interplay between homologous recom- 
bination and point mutations can lead to propagating 
fronts, in whose wake a population of microbes becomes 
genetically diverse in an evolutionarily short time. Thus, even 
in the absence of selection pressure and ecological barri- 
ers to genetic exchange, gene-exchange boundaries can 
emerge as a statistical consequence of the detailed dy- 
namics of recombination. We have presented a prelimi- 
nary analysis of available genome data for the Bacillus 
cereus group, which is consistent with the presence of 
front propagation. These findings prompt speculations 
about the implications for the evolution and the classifi- 
cation of microbes. 

Our model can be extended in a number of directions, 
including explicit accounting for the role of space, the ex- 
istence of a non-trivial network of gene exchange connec- 
tivity and the effects of sharing of beneficial mutations. 

A promising place to look for diversification 
fronts is metagenomic data. Such data can give us a consensus 
genome for an ensemble of closely related organisms, 
inhabiting the same environment, and an estimate 
for the sequence diversity along the consensus genome 
[28]. This diversity can be directly related to the order 
parameter ψ(x). A step-like variation in ψ(x) might be 
an indication of a diversification front. 
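
As a rough illustration of how such a step might be sought in practice, the sketch below computes a windowed diversity profile from a set of aligned sequences and fits a single two-level step to it by least squares. It is a minimal sketch under stated assumptions, not the analysis used in this work: the mean pairwise mismatch fraction is taken as a stand-in for the order parameter, the window size is arbitrary, and the function names (windowed_diversity, best_step, mutate_tail) are hypothetical. The synthetic input mimics a population that is uniform over the first half of the genome and diversified over the second half.

```python
import random


def windowed_diversity(seqs, window):
    """Mean pairwise mismatch fraction in non-overlapping windows.

    Assumption: this quantity is only a proxy for the order parameter;
    seqs must hold at least two aligned sequences of equal length.
    """
    length = len(seqs[0])
    pairs = [(i, j) for i in range(len(seqs)) for j in range(i + 1, len(seqs))]
    profile = []
    for start in range(0, length - window + 1, window):
        mismatches = 0
        for i, j in pairs:
            a = seqs[i][start:start + window]
            b = seqs[j][start:start + window]
            mismatches += sum(x != y for x, y in zip(a, b))
        profile.append(mismatches / (len(pairs) * window))
    return profile


def best_step(profile):
    """Least-squares fit of one step: returns (split index, left mean, right mean)."""
    best = None
    for k in range(1, len(profile)):
        left, right = profile[:k], profile[k:]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, k, ml, mr)
    return best[1], best[2], best[3]


def mutate_tail(seq, start, rate):
    """Hypothetical helper: substitute bases at the given rate from `start` onward."""
    out = list(seq)
    for pos in range(start, len(out)):
        if random.random() < rate:
            out[pos] = random.choice([b for b in "ACGT" if b != out[pos]])
    return "".join(out)


if __name__ == "__main__":
    random.seed(0)
    consensus = "".join(random.choice("ACGT") for _ in range(2000))
    # Synthetic population: identical up to position 1000, diversified beyond it.
    population = [mutate_tail(consensus, 1000, 0.2) for _ in range(6)]
    profile = windowed_diversity(population, window=100)
    k, left, right = best_step(profile)
    print(f"step at window {k}: diversity {left:.3f} -> {right:.3f}")
```

A single-step fit is the simplest possible choice. Real metagenomic profiles would likely need a noise model and a significance test before a step could be attributed to a diversification front.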